BMC Research Notes — Latest Matching Preprints

1

Impact of AI-Assisted Mammography Reading on Quality Indicators in the Czech Breast Cancer Screening Programme: A Retrospective Study

Veverkova, L.; Dolezalova, Z.; Marackova, V.; Mathew, E.; Urbankova, M.; Ambrozova, M.; Piskovsky, T.; Ngo, O.; Majek, O.

2026-05-26 oncology 10.64898/2026.05.25.26353869 medRxiv

Top 0.1%

2.2%

Objectives: The aim of mammographic screening is the early detection of invasive cancers. In the era of artificial intelligence (AI), this tool may improve diagnosis of earlier stages. The purpose of this study was to assess the impact on selected quality indicators retrospectively. Method: The data source was the Breast Cancer Screening Registry using data from one Screening Unit that currently uses AI routinely. The indicators of the cancer detection rate (CDR), further assessment rate (FAR), and recall rate (RR) in the year 2023, when AI was used, and the year 2022, without AI, in women aged 45-69 were compared. The statistical evaluation used the chi-square test and logistic regression adjusting for the effects of age, a woman's risk level, and the screening round at a 5% significance level. Results: In 2022, without AI, 4,034 women aged 45-69 were included, compared with 4,049 women in 2023 when AI was used. This study showed a non-significant increase in CDR from 5.0 breast cancers detected per 1,000 women (non-AI assessment) to 5.2 (AI-assisted assessment), p = 0.919; OR (95% CI): 1.034 (0.542-1.974), a significant decrease in the FAR from 5.2% to 3.9%, p < 0.001; OR (95% CI): 0.665 (0.529-0.836), and a decrease in RR from 2.4% to 1.9%, p = 0.083; OR (95% CI): 0.754 (0.548-1.037). Conclusion: AI has the potential to be a useful tool in the early detection of breast cancer by improving quality through a decrease in FAR and RR, while probably maintaining CDR.
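The unadjusted version of the comparison above can be sketched in a few lines. This is only an illustration: the abstract reports rates, not raw counts, so the counts of 20 and 21 cancers used below are back-calculated by rounding from the reported rates and cohort sizes and are an assumption. The function names are hypothetical, and the crude odds ratio will differ from the adjusted estimate of 1.034 reported in the abstract, which comes from a logistic regression with covariates.

```python
import math


def rate_per_1000(events, n):
    """Detection rate expressed per 1,000 screened women."""
    return 1000 * events / n


def crude_odds_ratio(events_ref, n_ref, events_new, n_new, z=1.96):
    """Unadjusted odds ratio (new vs reference) with a Wald confidence interval.

    Uses the standard error of the log odds ratio from a 2x2 table.
    """
    a, b = events_new, n_new - events_new   # new period: events, non-events
    c, d = events_ref, n_ref - events_ref   # reference period: events, non-events
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Assumed counts (not stated in the abstract): 20 cancers among 4,034 women
# in 2022 and 21 cancers among 4,049 women in 2023.
cdr_2022 = rate_per_1000(20, 4034)
cdr_2023 = rate_per_1000(21, 4049)
or_cdr, or_lower, or_upper = crude_odds_ratio(20, 4034, 21, 4049)
```

Under these assumed counts the interval spans 1, which is consistent in direction with the non-significant CDR result described in the abstract.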

2

Assessing the Impact of Interventions on Tuberculosis Control: India Based Modelling Framework

Raj, Y. A.; Parthasarathy, R.; Mitra, M. K.; Mehra, S.

2026-05-22 epidemiology 10.64898/2026.05.20.26353466 medRxiv

Top 0.1%

2.1%

Background India accounts for nearly one-fourth of the global tuberculosis (TB) burden. The country's progress towards elimination of TB is hindered by considerable heterogeneity in behavioural, social, and health system determinants, which influence transmission dynamics and care access. Evidence from the recent national TB prevalence survey showed that almost half of individuals with active disease were asymptomatic, underscoring the limitations of symptom-based case finding. Achieving the End TB targets will therefore require strategies that simultaneously address the substantial pool of individuals with undiagnosed, asymptomatic disease and those symptomatic individuals who do not seek care. Methods We developed a transmission model of TB that explicitly incorporates individuals with asymptomatic disease and those who do not seek care. Model calibration was performed within a Bayesian framework using epidemiological and programmatic data for India. The calibrated model was then used to project the potential impact of interventions on TB incidence and mortality. Results Under the baseline scenario, the estimated TB incidence and mortality rates for 2024 were 180 (163-203) and 24 (18-31) per 100,000 population, respectively. Across all intervention scenarios targeting improved diagnosis, active case finding, nutrition support, and their combination, the reduction in incidence rate by 2030 ranged from 13% to 60% compared with 2025, while the corresponding decline in mortality rate ranged from 16% to 66%. Conclusion While individual interventions yield measurable reductions in TB incidence and mortality, greater impact is achieved when they are implemented in combination, reflecting the need for a comprehensive, multi-component response towards TB elimination.

3

Effectiveness of RMSSD-Based Adaptive Music Therapy (Skitii) in Reducing Treatment-Related Anxiety in Head and Neck Cancer Patients: Protocol for a Randomized Controlled Trial

Adhikari, P.; M, D.; Subramanium, V.; Krishna, T.; B, A.; Jain, C. B.

2026-05-15 oncology 10.64898/2026.05.13.26353099 medRxiv

Top 0.1%

1.7%

Background: Head and neck cancer (HNC) patients experience clinically significant anxiety and depression in 65-85% of cases during active treatment. Current supportive care lacks personalized, real-time non-pharmacological interventions. Skitii is a novel HRV-adaptive music therapy system that uses continuous RMSSD (root mean square of successive differences) monitoring via a Polar H10 chest sensor to select music in real-time, targeting parasympathetic recovery (RMSSD >=30ms). Methods: This is a prospective, open-label, randomized controlled trial (1:1 allocation) at Yenepoya Medical College Hospital, Mangalore, India. Adults aged 18-75 years with confirmed head and neck cancer (any subsite, Stage I-IV) undergoing radiotherapy and/or chemotherapy with baseline distress (HADS >=8 or NCCN Distress Thermometer >=4) will be enrolled. Participants are randomized to Skitii adaptive music therapy (20-minute sessions, 3 times daily, 3 weeks) or static music therapy control. Skitii uses a two-phase algorithm: Phase 1 (0-2.5 minutes) uses heart rate as a stress proxy for immediate music selection; Phase 2 (2.5-20 minutes) uses RMSSD to adapt music every 2.5 minutes when physiological state changes by >=20%. Primary endpoints are HADS-Anxiety score and resting RMSSD at Week 3. Sample size is 70 (35 per arm), powered at 80% to detect a 2.5-point HADS difference (SD=3.8, alpha=0.05, 15% dropout). Analysis is ANCOVA, intent-to-treat. Discussion: This is the first randomized controlled trial evaluating RMSSD-based adaptive music therapy in cancer patients. The active control design isolates the effect of the adaptive algorithm from music exposure alone. If positive, results will support a scalable, cost-effective supportive care intervention with objective physiological monitoring, and provide the clinical evidence base for CDSCO Class B medical device approval for Skitii in India, with future CE Mark and FDA applications planned. 
Trial Registration: Clinical Trials Registry - India (CTRI), CTRI/2025/11/116732
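For readers unfamiliar with the metric, RMSSD is the square root of the mean of the squared differences between successive RR (beat-to-beat) intervals. The sketch below computes it and applies a 20% relative-change check. The function names are hypothetical, and reading "physiological state changes by >=20%" as a relative change in RMSSD between two windows is an assumption; the protocol abstract does not specify how Skitii evaluates the threshold.

```python
import math


def rmssd(rr_intervals_ms):
    """Root mean square of successive differences of RR intervals (ms)."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("need at least two RR intervals")
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def state_changed(previous_rmssd, current_rmssd, threshold=0.20):
    """Assumed rule: flag a change when RMSSD moves by at least `threshold`
    (relative to the previous window) in either direction."""
    if previous_rmssd <= 0:
        raise ValueError("previous RMSSD must be positive")
    return abs(current_rmssd - previous_rmssd) / previous_rmssd >= threshold


# Toy RR series in milliseconds (illustrative values, not trial data).
example_rmssd = rmssd([800, 810, 790, 800])
```

In a windowed setup like the one the abstract describes, `rmssd` would be evaluated on each 2.5-minute segment and `state_changed` would compare consecutive segments.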

4

Metastatic Patterns and Treatment Characteristics of Triple-Negative Breast Cancer in Nigeria: A Retrospective Cohort Study

Sowunmi, A.; Agbakwuru, C.; Aje, E.; Kehinde, O.; Andero, T.; Eze, C. G.; Oshikanlu, B.

2026-06-12 oncology 10.64898/2026.06.10.26355358 medRxiv

Top 0.1%

1.7%

Background: Triple-negative breast cancer (TNBC) is an aggressive breast cancer subtype characterized by the absence of estrogen receptor, progesterone receptor, and human epidermal growth factor receptor 2 expression. It is associated with limited targeted treatment options, early relapse, and a high propensity for visceral metastasis. Data describing metastatic patterns and treatment characteristics of TNBC in Nigeria remain limited. Methods: This retrospective descriptive cohort study included 869 patients with TNBC managed at the Medserve-LUTH Cancer Center, Lagos University Teaching Hospital, Nigeria between June 2019 and June 2024. Demographic, clinicopathologic, metastatic, and treatment-related data were extracted from electronic medical records. Descriptive statistics were used to summarize patient characteristics, metastatic patterns, and treatment profiles. Associations between metastatic disease and selected clinicopathologic and treatment variables were explored using Pearson's chi-square test. Complete-case analysis was applied throughout. Results: The mean age at presentation was 52.09 ± 12.26 years. Most patients were married (79.1%), postmenopausal (64.3%), and of Yoruba ethnicity (56.8%). Advanced disease predominated, with Stage III and Stage IV disease accounting for 42.9% and 35.6% of cases, respectively. Invasive ductal carcinoma was the most common histologic subtype (77.0%), while Grade II tumours constituted 51.3% of graded cases. Surgery was performed in 73.1% of patients, predominantly mastectomy (70.9% of surgical procedures). Chemotherapy was administered to 83.2% of patients, most commonly anthracycline-based regimens (41.8%), while radiotherapy was delivered to 63.5% of patients, with hypofractionated schedules of 42-43 Gy in 15-16 fractions accounting for 47.2% of radiotherapy courses. Metastatic disease was documented in 32.9% of evaluable patients.
Lung metastasis was the most frequent site (62.5%), followed by bone (46.3%), regional lymph node invasion (38.5%), liver (23.0%), and brain (22.6%). Tumour grade and histologic subtype were not significantly associated with metastatic disease, whereas radiotherapy exposure demonstrated a significant association with metastatic status (χ² = 10.35, p = 0.001). Conclusion: TNBC in this Nigerian cohort was characterized by advanced-stage presentation, invasive ductal predominance, extensive use of multimodality treatment, and substantial visceral metastatic burden. Lung metastasis was the most common metastatic site. These findings provide contemporary real-world data on TNBC in Nigeria and highlight the continuing need for earlier diagnosis, timely referral, and sustained investment in comprehensive cancer care services.

5

ChooseMyStat: A Web-Based Interactive Tool for Statistical Test Selection and Analysis Plan Generation in Clinical Research

Srivastava, S.; Punyani, S. R.; Vazalwar, D.; Joshi, A.; Pakhare, A. P.

2026-06-03 medical education 10.64898/2026.06.02.26354730 medRxiv

Top 0.1%

1.5%

Background: Postgraduate medical residents frequently face difficulty in selecting appropriate statistical tests and preparing statistical analysis plans (SAPs) for thesis work. Existing resources often identify statistical tests without guiding implementation, reporting or software execution. Aims: To describe the development, features and content validation of ChooseMyStat, a free, open-source, web-based interactive tool for statistical test selection and SAP text generation in clinical research. Methods: ChooseMyStat was developed as a React-based web application using an iterative, AI-assisted development process under direct faculty supervision. The tool uses a branching decision algorithm covering 18 inferential statistical tests, two diagnostic accuracy measures, four agreement/reliability statistics, and four descriptive statistics scenarios. For each recommendation, it generates a SAP template paragraph, a results reporting example, step-by-step JASP instructions, and R code. Content validation was performed using 105 open-access original research articles from 15 broad medical specialties published in Indian journals during 2024-2025. Results: The tool covers commonly used statistical methods, including t-tests, ANOVA, chi-square variants, non-parametric alternatives, correlation, regression (linear, logistic, ordinal), survival analysis, methods for clustered or repeated data, diagnostic accuracy measures, and agreement/reliability statistics. Among 365 statistical tests identified across 105 articles (excluding normality checking procedures), 346 (94.8%) were covered by the tool. Complete coverage of all statistical methods used was observed in 86 of 105 articles (81.9%). Conclusions: ChooseMyStat integrates statistical test selection with implementation guidance, SAP generation, reporting support and software instructions within a single interface.
The tool may support postgraduate research training by improving accessibility to applied biostatistics guidance.

6

Retrospective cohort study extracting coexisting background breast-lesion features from stage I-III invasive breast cancer

Lim, R. J. Y.; Nitar, P.; Lau, K. W.; Leong, L. C. H.; Lim, G. H.; Tan, V. K. M.; Tan, B. K. T.; Tan, E. Y.; Goh, S. S. N.; Hartman, M.; Wong, F. Y.; Li, J.; Joint Breast Cancer Registry

2026-05-22 oncology 10.64898/2026.05.19.26353633 medRxiv

Top 0.2%

1.4%

Background Background breast features are frequently noted in pathology reports alongside invasive breast cancer but rarely factor into prognosis or treatment decisions. Their relationship to tumor characteristics and patient outcomes remains incompletely characterised. Methods We conducted a retrospective cohort study of 7,603 patients with Stage I-III invasive breast cancer (diagnosed 1991-2022, age <80 years) from the Joint Breast Cancer Registry in Singapore. Natural language processing (NLP) was applied to 9,754 free-text pathology reports to extract co-existing background breast features, with accuracy validated by dual-reviewer assessment of 200 reports. Unsupervised hierarchical clustering grouped extracted features into three categories. Associations with tumor characteristics were assessed by multinomial logistic regression, and ten-year overall survival by Cox proportional hazards models (median follow-up 9.6 years; 620 deaths). Results Here we show that NLP-based extraction of background breast features from routine pathology reports achieves an accuracy of over 90% across features. Lobular neoplasia and benign proliferative changes are associated with less aggressive tumor characteristics, whereas early neoplastic and papillary lesions are more prevalent in HER2-enriched and luminal B tumor subtypes. Benign proliferative changes are associated with better survival in age- and year-adjusted models (hazard ratio 0.91, 95% CI 0.86-0.97), but this association is attenuated after adjustment for stage and subtype. Conclusions NLP-enabled extraction of background breast features from pathology text is feasible at scale. These features reflect tumor biology but do not independently add prognostic information beyond established clinical variables.

7

Precision Imaging to Evaluate Kaposi Sarcoma (PRIME-KS): protocol for a multicountry novel artificial intelligence-based imaging device

Odeny, T. A.; Adhiambo, H. F.; Mangale, D.; Makanga, P. K.; Odeny, B.; Okuku, F.; Zhou, C.; Geng, E.; Carson, J.; Mudhune, V.; Bukusi, E.; Semeere, A.

2026-06-04 oncology 10.64898/2026.06.03.26354815 medRxiv

Top 0.2%

1.4%

Background: Kaposi sarcoma (KS) is the most common cancer among men in several Eastern African countries, yet treatment monitoring relies on imprecise, time-consuming ruler-based measurements defined by the AIDS Clinical Trials Group (ACTG). This method suffers from inter-observer variability, fails to capture lesion height or true geometric area, and performs poorly on dark skin. SkinScan3D (SS3D) is a portable, low-cost, AI-enabled 3D imaging device that provides objective measurements of KS skin lesion area, height, volume, and color. The Precision Imaging to Evaluate Kaposi Sarcoma (PRIME-KS) study evaluates whether SS3D provides more reproducible and accurate lesion measurements than the standard method, and validates its integration into routine clinical workflows in Kenya and Uganda. Methods: PRIME-KS is a multicountry prospective mixed-methods study with two clinical objectives. Objective 1 is a cross-sectional diagnostic accuracy study comparing SS3D with ruler-based measurement in 50 adults with KS (150 lesions) across sites in Kenya and Uganda. Two clinicians independently measure three lesions per participant using both methods. The primary outcomes are concordance correlation coefficient (CCC) for inter-rater reproducibility, and coefficient of determination for accuracy. Objective 2 is a non-randomized before-and-after pilot study in 100 patients at three sites, evaluating device usability, acceptability, appropriateness, and feasibility using validated instruments, along with time-and-motion studies and activity-based micro-costing. Prior to these clinical objectives, a formative study used focus group discussions, discrete choice experiments, and human-centered design workshops to refine the SS3D device and protocols with end-user input. Discussion: PRIME-KS will provide the first rigorous evaluation of a 3D imaging device for monitoring KS treatment response in routine clinical settings.
If SS3D demonstrates superior reproducibility and clinical utility, it could reduce unnecessary chemotherapy exposure and associated toxicities by enabling earlier, more objective assessment of treatment response. Trial registration: ClinicalTrials.gov NCT06898203, registered 27 March 2025. Pan African Clinical Trials Registry PACTR202603523439856. Keywords: Kaposi sarcoma, SkinScan3D, 3D imaging, treatment monitoring, diagnostic accuracy, implementation science, usability, human-centered design, Kenya, Uganda
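The concordance correlation coefficient named as a primary outcome can be illustrated with Lin's standard formula, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). The sketch below is a minimal version using population (divide-by-n) moments. The function name and the toy measurements are hypothetical, and the protocol's actual estimator, which may need to account for repeated lesions per participant, is not described in the abstract.

```python
def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for two raters' measurements.

    Penalises both poor correlation and systematic offset between raters.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((v - mean_x) ** 2 for v in x) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Toy lesion measurements from two raters (illustrative values only).
rater_1 = [1.0, 2.0, 3.0]
rater_2 = [2.0, 3.0, 4.0]   # perfectly correlated but offset by 1
ccc_offset = concordance_correlation(rater_1, rater_2)
ccc_identical = concordance_correlation(rater_1, rater_1)
```

The offset example shows why CCC is preferred over Pearson correlation for reproducibility: the two raters correlate perfectly, yet the constant 1-unit disagreement pulls the CCC well below 1.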

8

Machine-Assisted Topic Analysis of Large-Scale Health Experience Data: Identifying Sociodemographic Differences and Evaluating Bias in Large Language Models

Bondaronek, P.; Ward, E.; Beecham, E.; Zhang, E.; Huang, Y.; Ive, J.; Naughton, F.; Wu, H.; Vindrola-Padros, C.

2026-05-22 public and global health 10.64898/2026.05.20.26353755 medRxiv

Top 0.2%

1.3%

Introduction: Large-scale free-text data with socio-demographic information can capture nuanced accounts of lived experience that are difficult to detect in structured measures. However, manual qualitative analysis is difficult to scale, while automated approaches may obscure subgroup variation or introduce bias. This is especially relevant for large language models (LLMs), whose use in qualitative health research is increasing despite limited evaluation in socio-demographically stratified analysis. Objectives: This study examined how socio-demographic differences in health and wellbeing experiences were manifested in a large-scale free-text dataset, and evaluated how different AI-assisted analytic approaches identified these differences. Specifically, it aimed to: (1) identify socio-demographic differences using Machine-Assisted Topic Analysis (MATA); (2) compare MATA outputs with topic modelling combined with LLM-based topic interpretation; and (3) examine potential bias in LLM-based analysis. Methods: We analysed 2,177 valid free-text responses from the UK COVID-19 Wellbeing Tracker, a longitudinal survey of adults recruited during the pandemic. Responses described factors influencing health behaviours, mood, and wellbeing over time. Data were preprocessed and stratified by gender, age, and socioeconomic status (SES). MATA combined topic modelling, using Latent Dirichlet Allocation, with human-led qualitative interpretation of topic keywords and representative responses. The same topic model outputs were then interpreted using an LLM for comparison. Potential LLM bias was assessed using a demographic label-swap crossover design, with bias evaluated through Jaccard lexical similarity, VADER sentiment, and NRC emotion analysis. Grounded Review and Assessment of Computational Evidence (GRACE) was used to evaluate the AI outputs.
Results: MATA identified meaningful socio-demographic thematic differences in pandemic-related mood and wellbeing across gender, age, and SES. Common themes included disruption, adaptation, uncertainty, routine, and the influence of work, relationships, and health on wellbeing. Male-stratified topics emphasised routines, habits, and coping with external pressures, whereas female-stratified topics were more relational and reflective, focusing on connection, isolation, family wellbeing, and anxiety. Lower SES narratives included practical strain, financial pressure, and loss of control, while higher SES narratives more often reflected adjustment, autonomy, and meaning-making. Older adults described health, gratitude, and family connection, whereas younger adults emphasised work-related stress and competing demands. LLM-based interpretation broadly reproduced the high-level subgroup patterns identified through MATA, but outputs were more generalised, less conceptually differentiated, and showed greater thematic overlap. Bias analysis showed systematic shifts in vocabulary, sentiment, and emotional tone when demographic labels were swapped, suggesting a risk of representational bias. Conclusions: MATA identified meaningful socio-demographic differences while retaining interpretative depth at scale. LLM-based topic interpretation showed utility for rapid thematic summarisation, but produced less conceptually differentiated outputs and was sensitive to demographic framing. The analysis also identified "LLM speak", where outputs appeared coherent but relied on abstract, generalised, and overlapping interpretations. Human oversight, structured qualitative appraisal, and explicit bias evaluation are necessary when using LLMs to analyse socially stratified free-text health data.
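One of the bias measures named above, Jaccard lexical similarity, is the size of the shared vocabulary divided by the size of the combined vocabulary of two texts. The sketch below shows the idea on two short strings. The tokenisation rule (lowercased runs of letters and apostrophes), the function name, and the example sentences are assumptions for illustration; the study's own preprocessing is not described in the abstract.

```python
import re


def jaccard_similarity(text_a, text_b):
    """Jaccard similarity between the word sets of two texts (0.0 to 1.0)."""
    tokens_a = set(re.findall(r"[a-z']+", text_a.lower()))
    tokens_b = set(re.findall(r"[a-z']+", text_b.lower()))
    union = tokens_a | tokens_b
    if not union:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(union)


# Hypothetical LLM summaries of the same topic under swapped demographic labels.
summary_original = "work stress and routine"
summary_swapped = "family stress and isolation"
similarity = jaccard_similarity(summary_original, summary_swapped)
```

In a label-swap design, a low similarity between outputs generated from identical topic keywords but different demographic labels would point to the label itself shifting the model's vocabulary.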

9

A Global Health Quality Improvement Project: Enhancing Cervical Cancer Awareness and screening in Nigeria

Umar, I. A.; Shehu, N.; Nagib, N.; Sulley, S.; Idris-Saeed, Z. O.

2026-06-11 public and global health 10.64898/2026.06.09.26355311 medRxiv

Top 0.2%

1.3%

Background Cervical cancer remains a significant global public health challenge, ranking as the fourth most common cancer among women worldwide. According to the World Health Organization (WHO), 604,000 women were diagnosed with cervical cancer globally in 2020, with over 342,000 deaths amongst this group [1]. Despite its high mortality, cervical cancer is largely preventable through early detection and vaccination against human papillomavirus (HPV), which causes nearly all cases of cervical cancer [1,2]. In Nigeria, it is the second most common cancer among women and a leading cause of cancer-related deaths, with low screening rates exacerbating late diagnoses and poor outcomes [1]. Despite global commitments to elimination with Pap smear screening and HPV vaccination, less than 10% of women in Nigeria have undergone screening due to misconceptions, stigma, and limited awareness. Educational interventions may improve awareness and promote screening behaviors. This global health quality improvement (QI) project aimed to enhance cervical cancer awareness and increase Pap smear uptake at the Central Bank of Nigeria (CBN) Clinic in Abuja, Nigeria. Methods In November 2024, we conducted a health education intervention at the Central Bank of Nigeria (CBN) through a structured educational session for male and female CBN staff members. The session focused on cervical cancer prevention, risk factors, and screening guidelines. Additionally, cervical cancer awareness was raised via email, social media, and an electronic bulletin board. Participants completed pre- and post-intervention surveys assessing cervical cancer knowledge across 10 key items and demographic characteristics. Pap smear uptake was assessed using the CBN clinic records for three months before and after the intervention. Institutional approval was obtained from CBN, and external institutional review board approval was not required.
Results 188 participants attended the health education session with 124 survey responses (70 pre-event, 54 post-event). Participants were mostly women aged 30-39. Post-intervention, eight of ten survey questions showed improved knowledge, with five demonstrating statistically significant gains: understanding Pap smear frequency (p<.001), HPV infection prevention (p=.042), early symptoms of cervical cancer (p=.019), smoking as a risk factor (p=.002), and availability of Pap smears at the CBN clinic (p=.035). Pap smear uptake increased from 5 screenings in three months pre-intervention to 32 screenings in the three months post-intervention. Participants reported that the sessions provided a safe space to ask questions and address cultural myths and misconceptions. Conclusion This QI initiative demonstrates the positive impact of targeted health education in improving awareness and screening uptake. Recommendations include increasing awareness through public health talks, updating clinicians on current guidelines, and removing unnecessary barriers to HPV vaccination. These findings align with global health efforts to reduce cervical cancer mortality and underscore the potential of QI projects to improve health outcomes in resource-limited settings.

10

Cancer Prevalence and Patterns in Kilifi County: A 10-year Retrospective Descriptive Study

Masha, M.; Mbugua, R. W.; Abdullahi, M.; Sheikh, N. A.; Omar, A.; Abdihamid, O.

2026-06-01 oncology 10.64898/2026.05.20.26353643 medRxiv

Top 0.2%

1.3%

Background Cancer is an increasing public health challenge in Kenya, particularly in rural and underserved regions where surveillance systems and diagnostic capacity remain limited. Kilifi County, located along the Kenyan coast, lacks a population-based cancer registry, and data on the local cancer burden are not available. This study aimed to characterize the demographic distribution of patients, cancer burden in the county, and management of cancer cases diagnosed at Kilifi County Referral Hospital (KCRH) over ten years. Methods This retrospective study analyzed the patterns of cancer in Kilifi County using patient records from KCRH during the study period (January 1, 2014, to January 1, 2024). Results A total of 101 patients with cancer were identified, 58% female, with a mean age of 54 years. Most patients were from Kilifi North (47%), with a high proportion reporting no formal occupation (41%) or farming (26%). Esophageal and cervical cancers were the most common (18% each), followed by breast and prostate cancers (5% each), with other malignancies occurring infrequently. Histopathology was the primary diagnostic modality (88%). Staging data were incomplete in 70% of cases; among documented cases, the majority presented with advanced disease (21% stage IV). Due to limited local treatment capacity, approximately half of the patients were referred to tertiary centers for chemotherapy, radiotherapy, or surgery. At data cut-off, 43% had died, 25% were on treatment, and 29% were lost to follow-up, with only 2% completing treatment or under follow-up. Conclusions This study demonstrates a substantial cancer burden in Kilifi County and highlights critical gaps in diagnostic capacity, staging, and continuity of care. Strengthening cancer surveillance systems, expanding diagnostic and treatment infrastructure, and establishing a population-based cancer registry are essential to improving cancer outcomes and advancing equitable care in rural Kenya.

11

Direct and mediated effects (DME) SLCMA: a novel method for life course modelling with time-varying covariates

Beer, S.; Simpkin, A. J.; Eldeeb, S. Y.; Zar, H. J.; Stein, D. J.; Dunn, E. C.; Smith, A. D. A. C.

2026-06-06 epidemiology 10.64898/2026.05.29.26354427 medRxiv

Top 0.2%

1.2%

Background: In prospective cohort studies, where an exposure is collected repeatedly, interest often lies in determining whether the timing of that exposure has a differential effect on a later outcome. The Structured Life Course Modeling Approach (SLCMA), where users select between temporal hypotheses of exposure specified a priori, provides one way to analyse such longitudinal data. However, few studies using SLCMA consider the effect of time-varying covariates (TVC) which may impact associations. Methods: We present a modified version of the SLCMA - called direct and mediated effects (DME)-SLCMA - which corrects for TVC. We first develop the DME-SLCMA method, test it through simulation, and apply it to psychosocial data from the Drakenstein Child Health Study (DCHS, n=336) to investigate relationships between maternal psychopathology, TVC of socioeconomic status, and offspring depressive symptoms. Results: We found that, on average, offspring depressive symptoms score increased by 3.9% (95% CI: 1.0%-6.9%, p = 0.039) for each unit of maternal psychopathology (SRQ) at 48 months whilst adjusting for time-varying socioeconomic status (at 18, 30, 42 and 54 months). Our simulations identified several realistic scenarios where selections ignoring TVC - with TVC mediated exposure effects present - were prone to be incorrect, including our DCHS example. Conclusion: DME-SLCMA is a robust new approach for life course modelling in the presence of time-varying covariates. We recommend adjusting for TVC whenever possible, and, when not possible, our simulation study identified that scenarios where mediated effects are comparable, or greater, in magnitude to direct effects are most prone to confounding.

12

CBCRisk-Mastectomy: A Risk Prediction Tool to Aid Contralateral Prophylactic Mastectomy Decision Making

Sajal, I. H.; Pfeiffer, R. M.; Jatoi, I.; Gail, M. H.; Cecchini, R. S.; Choudhary, P. K.; Biswas, S.

2026-05-15 surgery 10.64898/2026.05.12.26352924 medRxiv

Top 0.3%

1.0%

Purpose: Unilateral breast cancer (BC) patients scheduled for mastectomy often choose to undergo contralateral prophylactic mastectomy (CPM), despite substantial declines in contralateral breast cancer (CBC) risk in recent decades. Models predicting absolute risk of future CBC can aid informed decision-making about CPM. CBCRisk is an existing CBC absolute risk prediction model trained on unilateral BC patients regardless of whether they had mastectomy. Here we developed CBCRisk-Mastectomy, tailored specifically to BC patients scheduled for mastectomy and considering CPM. Patients and Methods: We used data on BC patients who underwent mastectomy to treat their first BC from two nationally representative sources: Breast Cancer Surveillance Consortium (BCSC) and Surveillance, Epidemiology, and End Results (SEER) cancer registry. We imputed missing data in the BCSC sample and used conditional logistic regression models, trained on 2,660 BC patients (665 CBC cases) from BCSC, to identify predictors and estimate relative risks (RRs). These were combined with attributable risks and CBC incidence rates estimated from SEER to obtain absolute risk. Cross-validation was used to internally validate CBCRisk-Mastectomy and compare with CBCRisk. Results: CBCRisk-Mastectomy has nine predictors: first BC type, lobular carcinoma in situ status, estrogen receptor status, tumor stage, breast density, age at BC diagnosis, family history of BC, age at first birth, and body mass index. The areas under the curve and their 95% confidence intervals for 5-year predictions for CBCRisk-Mastectomy and CBCRisk were 0.62 (0.59, 0.65) and 0.58 (0.55, 0.61), respectively. Conclusions: CBCRisk-Mastectomy may aid clinicians in counseling BC patients scheduled for mastectomy, enabling improved decision-making regarding CPM.

13

A comparison of scalable approaches for the pairwise analysis of large pathogen genomic and spatial datasets: an application to studying Mycobacterium tuberculosis transmission

Lan, Y.; Wu, C.-Y.; Lin, H.-H.; Cohen, T.; Warren, J. L.

2026-05-21 microbiology 10.64898/2026.05.21.726848 medRxiv

Top 0.3%

0.9%

Pairwise analysis of genomic and spatial data offers opportunities to identify and estimate the associations between covariates and the transmission of pathogens between individuals. However, such pairwise analyses are computationally intensive, and may not be feasible to conduct given the high dyad count in even moderately sized datasets. Here we compare two approaches to increase the efficiency of pairwise analysis for large datasets. We quantify and compare the performance of divide-and-conquer Bayesian model fitting and pairwise case-control approaches for estimating associations between individual- and pair-level covariates and shared membership in a transmission cluster. We utilize a large dataset (n=4,154) of spatially-referenced, genomically-sequenced Mycobacterium tuberculosis isolates collected from a single city for this analysis. We find that the case-control approach produces unbiased estimates of effect sizes with expected credible interval coverage and is more robust than the divide-and-conquer method when effect sizes are large. Thus, we recommend using the case-control approach with at least three controls per case to downscale datasets for pairwise analysis when analysis of the entire dataset is not possible. This approach mitigates the computational challenges of pairwise Bayesian modeling on datasets that require significant computational resources while maintaining desired inferential properties. Author Summary: Pairwise analyses of large datasets to study pathogen transmission are computationally demanding because they typically require simultaneous analysis of each possible pair of individuals in a dataset; as datasets become larger these analyses often are not feasible to conduct even with access to high-performance computing resources. In this work, we compare a case-control approach and divide-and-conquer approaches for more efficient pairwise analysis of large datasets.
Using a large dataset of Mycobacterium tuberculosis isolates including genetic and spatial data, we investigate the performance of each method for estimating the associations between host covariates and genetic clustering of isolates. We find that the case-control approach is generally preferred over methods which first divide the data into subsets and then combine results. While additional extensions of these analyses are needed to test the generality of these findings to other data settings, this work provides a practical way forward for the pairwise analysis of large datasets to study pathogen transmission.
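To give a sense of scale for the abstract above: n = 4,154 isolates yield n(n-1)/2 = 8,625,781 unordered pairs. The following is a minimal sketch of pairwise case-control downsampling, not the authors' procedure. It assumes a "case" is a pair of isolates sharing a transmission cluster and a "control" is a pair that does not; the function names, the `cluster_of` mapping, and the uniform sampling of controls are all illustrative assumptions.

```python
import random
from itertools import combinations


def n_pairs(n):
    """Number of unordered pairs (dyads) among n individuals."""
    return n * (n - 1) // 2


def sample_case_control_pairs(cluster_of, controls_per_case=3, seed=0):
    """Downsample pairs for a pairwise analysis (hypothetical helper).

    cluster_of: dict mapping isolate id -> cluster label (None = unclustered).
    Returns (cases, controls): cases are all pairs sharing a cluster,
    controls are a uniform random sample of the remaining pairs,
    controls_per_case per case (or all of them if fewer exist).
    """
    rng = random.Random(seed)
    cases, non_cases = [], []
    # Note: this sketch still enumerates every pair once; only the
    # downstream model fitting would run on the reduced set.
    for a, b in combinations(sorted(cluster_of), 2):
        same = cluster_of[a] is not None and cluster_of[a] == cluster_of[b]
        (cases if same else non_cases).append((a, b))
    k = min(len(non_cases), controls_per_case * len(cases))
    controls = rng.sample(non_cases, k)
    return cases, controls
```

With three controls per case, the lower bound recommended in the abstract, the number of pairs passed to the model is at most four times the number of clustered pairs, however large the full pair set is.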

14

Long-read sequencing reveals transposable element-derived chimeric transcripts at zygotic genome activation in mammalian embryos

Kawakami, S.; Kitao, K.; Ikeda, S.; Honda, S.

2026-05-28 developmental biology 10.64898/2026.05.25.727629 medRxiv

Top 0.4%

0.9%

Background: Transposable elements (TEs) are mobile genomic sequences that constitute one-third to one-half of the mammalian genome. Recently, TEs have been recognized for their important roles as cis-regulatory elements. TEs are broadly activated during zygotic genome activation (ZGA) in mammalian embryos, where they function as alternative promoters of host genes and drive the transcription of chimeric transcripts. However, the construction of comprehensive chimeric transcript databases based on short-read sequencing remains limited due to the repetitive and abundant nature of TEs in the genome. Here, we used long-read RNA sequencing to construct a comprehensive dataset of chimeric transcripts expressed in ZGA mouse and bovine embryos. Results: We identified 11,996 and 4,755 chimeric transcript variants derived from 2,695 and 1,200 host genes in mouse and bovine, respectively, exceeding the numbers reported in previous short-read-based studies. Among them, 114 orthologous pairs produced chimeric transcripts in both species. Gene Ontology analysis revealed significant enrichment of terms related to transcriptional regulation and protein modification in mouse, whereas no terms were significantly enriched in bovine. Assessment of the protein-coding potential of the TE-driven transcripts using predicted open reading frames (ORFs) revealed that the proportion of "Protein-coding" transcripts was lower, whereas that of "LncRNA" (long non-coding RNA) was higher compared with all transcripts in both species. Among the ORFs classified as "Protein-coding", comparison with canonical ORFs revealed a tendency for the N terminus to be truncated while the C terminus remained intact in both species. TE-derived promoters used in mouse were enriched for mouse-specific TEs, whereas those in bovine were enriched for older TEs conserved among eutherians.
In addition, long-read sequencing detected a greater number and proportion of TEs used as promoters in mouse and bovine than short-read sequencing. Although motif analysis identified KLF5 and OTX2 binding sites upstream of TE-derived promoters in both species, the specific TEs containing these motifs differed between the two species. Conclusions: This study presents the first long-read sequencing analysis of chimeric transcripts in mammalian embryos in two species. Our approach revealed the functional similarities of chimeric transcripts between species, as well as species-specific differences in their TE compositions.

15

Targeted BRCA1/BRCA2 Sequencing in a Bangladeshi Clinically Referred Cohort Identifies Candidate BRCA1 Loss-of-Function Variants and a Multi-Exon Deletion-Like CNV Signal

Al Sium, S. M.; Banu, T. A.; Goswami, B.; Naser, S. R.; Habib, M. A.; Akter, S.; Ara, M. H.; Al Din, S. M. S.; Nafisa, A.; Nayem, M. R.; Rabbi, M. F. A.; Sarkar, M. M. H.; Khan, M. S.

2026-05-20 oncology 10.64898/2026.05.11.26352643 medRxiv

Top 0.4%

0.9%

Background: Population-relevant BRCA1/BRCA2 data from Bangladesh are scarce, creating challenges for hereditary breast and ovarian cancer variant interpretation, counseling, and follow-up testing. We examined a clinically referred Bangladeshi cohort to characterize assay-derived BRCA1/BRCA2 short variants, sequencing-depth performance, and copy-number findings in a conservative pilot framework. Methods: Twenty-three de-identified blood-derived DNA samples were assessed using a targeted BRCA1/BRCA2 next-generation sequencing workflow. Downstream analysis used assay-generated short-variant, coverage, and CNV outputs, with coordinates reported on hg19/GRCh37. Short variants were evaluated from high-confidence PASS/VCC-H calls, and CNV review incorporated both target-region and amplicon-level copy-number patterns. Results: After removal of four low-VAF review observations, the primary germline-compatible dataset comprised 304 short-variant observations representing 34 unique variants. Both BRCA1 and BRCA2 contributed comparable variant burdens, while the overall profile was mainly composed of missense and synonymous changes. Six sample-specific heterozygous BRCA1 truncating candidates were observed, including five frameshift variants and one stop-gain variant. Protein-level mapping placed these events across the central-to-C-terminal portion of BRCA1. Sequencing depth was consistently high across the targeted regions, with all 4,255 amplicon-sample measurements exceeding 280x and 99.91% reaching at least 500x. Copy-number analysis highlighted one candidate BRCA1 multi-exon deletion-like event involving exons 15-20 in BCSIR-BRCA-21, with unresolved partial exon 14 involvement. Conclusions: This study provides an initial Bangladesh-focused targeted BRCA1/BRCA2 dataset and identifies candidate short-variant and CNV findings for validation. 
These findings should be interpreted as analytical candidates only and require confirmatory testing and expert clinical curation before any clinical application. The cohort is referral-enriched and should not be used to infer population prevalence.

16

Prognostic performance of an AI-based recurrence risk model in clinically low-risk HR+/HER2- early breast cancer

Tang, C.; Biswas, D.; Liu, C.; Zeng, K.; Geras, K. J.; Witowski, J.; Meurs, C.; Westenend, P. J.

2026-06-03 oncology 10.64898/2026.06.02.26354233 medRxiv

Top 0.4%

0.8%

Objective: Accurate prognostication of recurrence risk in HR+/HER2- early breast cancer is central for therapeutic decision-making, including identifying patients who may safely avoid adjuvant systemic therapy. However, the performance of existing prognostic tools remains insufficient for effective clinical stratification, motivating the development of artificial intelligence (AI)-based methods to improve risk stratification. Methods: Ataraxis Breast CTX (ATX) is a multi-modal AI test that integrates H&E-stained whole-slide images with clinicopathologic features to predict risk of recurrence for individual patients. This study aims to validate ATX in an external dataset enriched for clinically low-risk patients from Dordrecht, the Netherlands. ATX scores were generated for 892 women diagnosed with early HR+/HER2- breast cancer. Of the 892 patients, 299 did not receive adjuvant systemic therapy. The discriminative performance of ATX was assessed using C-index and its stratification ability was evaluated by log-rank tests comparing Kaplan-Meier survival curves across risk groups. Results: ATX achieved a C-index of 0.71 and a 5-year time-dependent AUC of 0.71, demonstrating strong discrimination in predicting recurrence-free survival (RFS). Among 299 patients who received no adjuvant therapy, ATX achieved a C-index and time-dependent AUC of 0.78 and 0.81 respectively, suggesting ATX retains prognostic information in the absence of systemic therapy. ATX scores were used to stratify patients into risk groups using a pre-specified threshold, where 656 (74%) were classified as ATX low-risk and 236 (26%) were classified as high-risk.
Notably, untreated and treated ATX low-risk patients had comparable 5-year RFS (untreated: 5-year RFS = 96%, 95% CI = 92-97%; treated: 5-year RFS = 96%, 95% CI = 93-97%) with near-identical 10-year RFS (86%, 95% CI = 83-92% for both), suggesting ATX low-risk status may identify a subgroup with favorable prognosis independent of treatment exposure. Conclusion: ATX provides robust prognostic stratification in an external cohort of clinically low-risk HR+/HER2- early breast cancer and identifies a subgroup of patients who did not receive systemic therapy with favorable observed outcomes. These results support prospective validation of ATX as a decision-support tool for adjuvant therapy de-escalation in HR+/HER2- early breast cancer.
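For readers unfamiliar with the C-index reported above, the sketch below shows one common estimator, Harrell's concordance index, on right-censored data. The abstract does not state which estimator was used, so this choice is an assumption, and the function name and argument layout are illustrative only.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: follow-up time per patient.
    events: 1 if recurrence was observed, 0 if censored.
    risk_scores: higher score = higher predicted risk.
    A pair is comparable when the patient with the shorter follow-up
    time had an observed event; pairs tied on time are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which is the scale on which the reported 0.71 and 0.78 should be read.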

17

Breast cancer over-diagnosis due to mammography screening - A long-term follow-up population study of BreastScreen Norway

Heggland, T.; Vatten, L. J.; Opdahl, S.; Weedon-Fekjaer, H.

2026-06-03 epidemiology 10.64898/2026.06.02.26354696 medRxiv

Top 0.5%

0.8%

Objectives: Estimates of breast cancer over-diagnosis related to mammography screening vary substantially. Over-diagnosis is commonly defined as cases that would not have been detected during the person's remaining lifetime in the absence of screening. We here aim to quantify over-diagnosis in the population-based BreastScreen Norway mammography screening program using long-term follow-up and more detailed modeling than previous studies. Setting: We applied data on Norwegian screening patterns and breast carcinoma incidence for the period 1987-2019, covering women aged 49-84 years, leveraging the gradual implementation of the organized biennial BreastScreen Norway screening program for women aged 50-69 during 1995-2005. Methods: Using an extended age-period-cohort model, we estimated excess lifetime risk of invasive breast cancer and ductal carcinoma in situ in the presence of program screening, as an indicator of over-diagnosis among screen-detected cases. Results: Lifetime risk of breast carcinomas was 6.6% (95% confidence interval 2.5% to 10.7%) higher for invited than for non-invited women. This indicates that 18% (95% confidence interval 7.3% to 28.0%) of screen-detected cases may be over-diagnosed, and that approximately one in 86 (95% confidence interval 54 to 210) screened women were over-diagnosed during their screening period. Using effect estimates from previous studies, we estimated that approximately three women are over-diagnosed for every breast cancer death prevented by screening, and that 87% of over-diagnosed tumors might grow extremely slowly. Conclusions: Over-diagnosis related to mammography screening is a considerable problem, but its extent may be smaller than reported in some previous studies. Most over-diagnosed tumors likely grow very slowly.

18

Translation and Cross-cultural Validation of Leprosy Case Detection Delay Questionnaire Among Persons Affected by Leprosy in Southeast Nigeria

Eze, C. C.; Murphy-Okpala, N. N.; Ekeke, N.; Nwafor, C.; Egbule, D.; Njoku, M.; Ezeakile, O.; Meka, A.; Iyama, F. S.; Ogbuefi, E.; Ugwu, O.; Solomon, M.; Adesigbin, C.; Chukwu, J.

2026-06-09 public and global health 10.64898/2026.06.06.26355058 medRxiv

Top 0.5%

0.8%

Show abstract

Introduction Reducing delays in leprosy case detection is essential for achieving global leprosy targets. Accurate measurement of these delays and their determinants relies largely on patient-reported data, as routine health records are often inadequate. The leprosy case detection delay (CDD) questionnaire, developed under the Post Exposure Prophylaxis for Leprosy (PEP4LEP) project, has been validated in Ethiopia, Mozambique, Tanzania, and Indonesia. However, it has not been adapted or validated for Nigeria or any major Nigerian indigenous language. This study aimed to culturally adapt and validate the CDD questionnaire for Igbo-speaking populations in Nigeria. Methodology/Principal Findings The CDD questionnaire underwent a standardized cross-cultural adaptation process. Content validity was assessed using item- and scale-level content validity indices, while construct validity was evaluated through hypothesis testing. Reproducibility was assessed using test-retest and inter-rater reliability; agreement using the Bland-Altman method and the Wilcoxon Signed-Rank test; reliability using Spearmans rank correlation coefficient and the Intraclass Correlation Coefficient (ICC); and internal consistency using Cronbachs alpha. Data were collected through face-to-face interviews with persons affected by leprosy at two time points separated by at least two weeks. Participants (n=100) had a mean age of 45.1 years (SD=18.7). Mean CDD was 77.2 months at baseline and 77.9 months at retest. The instrument demonstrated excellent content validity (I-CVI/S-CVI: 0.90-1.00), good internal consistency (Cronbachs =0.77), and excellent test-retest reliability (ICC=0.996, 95% CI: 0.994-0.997). Test and retest measurements were highly correlated ({rho}=0.985, p<0.001), with no evidence of systematic change over time (p=0.864). Seventy-two percent of participants reported identical CDD values across assessments. 
All items from the original English version were retained without modification. Conclusion/Significance: The Igbo version of the CDD questionnaire demonstrated good validity and reliability and is suitable for assessing leprosy case detection delay among Igbo-speaking populations in Nigeria.
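The internal-consistency statistic reported above, Cronbach's alpha, follows the standard formula α = k/(k-1) · (1 - Σ item variances / variance of total scores) for k items. The sketch below is a plain illustration of that formula using sample variances; the function name and the respondent-by-item row layout are assumptions, not the study's analysis code.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha from a respondents-by-items score table.

    rows: list of per-respondent lists, one numeric score per item.
    Uses sample variances (n - 1 denominator) throughout.
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Variance of each respondent's summed score.
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

When every item ranks respondents identically the function returns 1.0; values around 0.7 to 0.8, such as the 0.77 reported here, are conventionally described as acceptable to good.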

19

Ambient AI Documentation in Mixed-Language Encounters: A Heuristic Evaluation of Spanish-English and Mandarin-English Conversations

Hu, D.; Flores, D.; Flores, L.; Chien, R.; Lam, K.; Chow, E.; Guo, Y.; Tam, S.; Perret, D.; Pandita, D.; Zheng, K.

2026-05-22 health informatics 10.64898/2026.05.19.26353603 medRxiv

Top 0.5%

0.8%

Ambient AI documentation systems rely on automatic speech recognition to transcribe patient-provider conversations before generating clinical notes. However, little empirical evidence exists on how these systems perform in mixed-language clinical encounters. We conducted a mixed-method heuristic evaluation of an ambient AI documentation tool using 24 reenacted primary care conversations involving Spanish-English and Mandarin-English code-switching. Quantitative analyses measured mixed error rate (MER) and code-switching detection. Overall MER was low: the median was 4%, with less variation, in Spanish-English conversations and 9% in Mandarin-English conversations, although outliers reached 67%. The system generally detected language switches reliably, although deletions occurred frequently in Mandarin-English transcripts at switch points. Qualitative analysis revealed transcription errors related to phonetic similarity, automatic language translation, clinical terminology recognition, and language-specific challenges. These findings highlight considerations for improving ambient AI clinical documentation systems to support multilingual providers in delivering care for linguistically diverse populations.
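The abstract above does not define how MER was computed. A common convention for code-switched speech, assumed here, is to count Latin-script words and individual Chinese characters as tokens and divide the token-level edit distance by the reference length. The sketch below follows that convention; the tokenizer and function names are hypothetical.

```python
import re

# Assumed tokenization: one token per CJK character, one per Latin-script word.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[a-z\u00e0-\u00f6\u00f8-\u00ff]+")


def tokenize(text):
    """Split mixed-language text into word and character tokens."""
    return _TOKEN.findall(text.lower())


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def mixed_error_rate(reference, hypothesis):
    """Token-level errors divided by the number of reference tokens."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    if not ref:
        raise ValueError("reference has no tokens")
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition, dropping one word from a six-word Spanish-English reference gives an MER of 1/6, roughly 17%, so the reported medians of 4% and 9% correspond to a few token errors per hundred.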

20

Prevalence of nutritional, behavioral and anthropometric cancer-related risk factors among adults in Nouakchott, Mauritania: a cross-sectional study

Tolba, N.; Najdi, A.; El Hfid, M.; Hmeied Maham, M.; Brahim, S. M.; Tolba, A.; Sellal, N.

2026-05-26 epidemiology 10.64898/2026.05.23.26353924 medRxiv

Top 0.6%

0.7%

Background: Cancer is a growing public health challenge in low- and middle-income countries, where urbanization, nutritional transition and lifestyle changes contribute to modifiable risk factors. In Mauritania, population-based data on cancer-related nutritional, behavioral and anthropometric risk factors remain limited. Objective: To describe the frequency of the main nutritional, behavioral and anthropometric cancer-related risk factors among adults living in the three wilayas of Nouakchott. Methods: A cross-sectional study was conducted among 1,000 adults aged 18 years and older in Nouakchott. Data were collected using a standardized questionnaire covering sociodemographic characteristics, dietary habits, physical activity and selected health behaviors. Anthropometric measurements were performed to assess body mass index and abdominal adiposity. Abdominal obesity was defined using sex-specific waist circumference cut-off points recommended by the World Health Organization: ≥ 88 cm in women and ≥ 102 cm in men. Results were presented as frequencies and proportions, with comparisons by sex, age group and wilaya of residence. Results: Women represented 52.0% of participants, and 53.5% were aged 18-34 years. Excess body weight was frequent, with 38.6% overweight and 28.0% obese. Abdominal adiposity was also common, with 58.0% having increased or substantially increased waist circumference and 48.3% having an elevated waist-to-hip ratio. Physical inactivity was reported by 64.7% of participants, and 15.7% were current smokers. Dietary exposures included high red meat consumption in 66.8%, daily refined cereal intake in 67.5%, daily sugar-sweetened beverage consumption in 14.9%, and limited daily fresh fruit consumption in 13.8%. Significant differences were observed by sex for anthropometric indicators, by age for selected dietary habits, and by wilaya for physical activity, smoking and selected dietary behaviors.
Conclusion: This study shows a high frequency of modifiable cancer-related risk factors among adults in Nouakchott, particularly excess body weight, abdominal adiposity, physical inactivity and unfavorable dietary habits. These findings support the need to strengthen primary prevention strategies targeting nutrition, physical activity and tobacco control in Mauritania.